Closed
Collaborator:
Note that we should update the list of supported algorithms in the README when this lands.
Merged
dsikka pushed a commit that referenced this pull request on Apr 21, 2025
SUMMARY:
Addition of [`AWQModifier`](https://arxiv.org/pdf/2306.00978), based on [AutoAWQ implementation](https://github.com/casper-hansen/AutoAWQ/blob/main/awq/quantize/quantizer.py#L28).

Should be reviewed/merged in conjunction with vllm-project/compressed-tensors#269

Replaces #181 and #824

TEST PLAN:
Some unit tests included, but as this was mostly a port from AutoAWQ, we validated the code by ensuring we could reproduce the evaluation metrics in Table 4 of [the paper](https://arxiv.org/pdf/2306.00978). We achieve the following wikitext PPL scores:

Llama-2 7B Group 128:
1. Paper: 5.60
2. AutoAWQ: 5.615
3. This implementation: 5.612
4. We match what the paper reports for just RTN -- 5.73
5. We get reasonable results for channel-wise -- 6.788. AutoAWQ errors out for this (setting `"q_group_size": -1` in the `quant_config`), and results not reported in paper.

Llama-2 13B Group 128:
1. We match the results of AutoAWQ and the results shown in the paper: 4.97
2. We match what the paper reports for just RTN -- 4.984

NOTE: We are excluding the clipping logic in this implementation, if we want to add it we should add it as another modifier, they are mutually exclusive and the data model for AWQ doesn't align well with clipping. That might be the reason for the slight deviation of results reported in the paper and in our implementation

---------

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
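
For readers unfamiliar with the "Group 128", "channel-wise", and "RTN" settings compared in the test plan, the following is a minimal, illustrative sketch of min-max round-to-nearest (RTN) quantization with a configurable group size. It is not the code from this PR or from AutoAWQ: the function names `rtn_quantize_row` and `max_abs_error`, the 4-bit default, and the `1e-5` scale floor are all assumptions made for the example. AWQ additionally rescales salient channels before this rounding step, which the sketch does not cover.

```python
from typing import List


def rtn_quantize_row(row: List[float], num_bits: int = 4, group_size: int = 128) -> List[float]:
    """Quantize then dequantize one weight row with asymmetric min-max RTN.

    Hypothetical helper for illustration only. group_size=-1 treats the whole
    row as one group, i.e. channel-wise quantization.
    """
    if group_size == -1:
        group_size = len(row)
    if group_size <= 0 or len(row) % group_size != 0:
        raise ValueError("row length must be a positive multiple of group_size")

    q_max = 2 ** num_bits - 1  # largest integer level, e.g. 15 for 4 bits
    out: List[float] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        lo, hi = min(group), max(group)
        # One scale and zero point per group; the floor avoids division by zero.
        scale = max((hi - lo) / q_max, 1e-5)
        zero_point = min(max(round(-lo / scale), 0), q_max)
        for w in group:
            q = min(max(round(w / scale) + zero_point, 0), q_max)
            out.append((q - zero_point) * scale)
    return out


def max_abs_error(a: List[float], b: List[float]) -> float:
    """Largest element-wise absolute difference between two rows."""
    return max(abs(x - y) for x, y in zip(a, b))


# Toy row: small weights followed by large ones.
row = [0.0, 0.1, 0.2, 0.3, 0.0, 10.0, 20.0, 30.0]
grouped = rtn_quantize_row(row, group_size=4)        # two groups, two scales
channel_wise = rtn_quantize_row(row, group_size=-1)  # one scale for the row
print(max_abs_error(row, grouped), max_abs_error(row, channel_wise))
```

In this toy row, a single channel-wise scale is dominated by the large weights, so the small weights collapse to zero, while per-group scales preserve them. This is the general reason a smaller group size tends to give lower perplexity than channel-wise quantization, at the cost of storing more scales and zero points.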